Skip to content

test: add tests for build pipeline scripts#28

Merged
shaypal5 merged 2 commits into
mainfrom
add-build-pipeline-tests
Apr 30, 2026
Merged

test: add tests for build pipeline scripts#28
shaypal5 merged 2 commits into
mainfrom
add-build-pipeline-tests

Conversation

@shaypal5

Copy link
Copy Markdown
Contributor

Summary

  • Adds 37 tests for the build pipeline scripts (scripts/build_v5_snapshot.py and scripts/validate_lead_scoring_dataset.py), closing Add tests for build pipeline scripts (build_v4_snapshot.py, validate_v4_dataset.py) #23
  • 31 unit tests in tests/scripts/test_build_v5_snapshot.py covering all pipeline functions: derive_binary_features(), cap_expected_acv(), rename_and_select(), subsample(), inject_missingness(), boost_leakage_trap()
  • 6 CLI integration tests in tests/scripts/test_validate_cli.py covering exit codes, --out-json, --emit-release-snippet, --enforce-1000, and missing args

Key edge cases tested

  • subsample() with insufficient positives/negatives (warns and adjusts)
  • inject_missingness() rate bounds (<20%), source-conditional variation (SDR > inbound), columns not in spec unaffected
  • boost_leakage_trap() only affects converted leads, monotonically increases values
  • All pipeline functions: input immutability, determinism given seed
  • rename_and_select() raises on missing columns

Test discovery

Scripts live in scripts/ (not leadforge/), so they're imported via importlib.util.spec_from_file_location() — no sys.path manipulation needed.

Test plan

  • All 700 tests pass (663 existing + 37 new)
  • ruff check and ruff format clean
  • Pre-commit hooks pass

🤖 Generated with Claude Code

Closes #23

…idate CLI)

Closes #23. Adds unit tests for all pipeline functions in
scripts/build_v5_snapshot.py and CLI integration tests for
scripts/validate_lead_scoring_dataset.py.

Coverage: subsample edge cases (insufficient positives/negatives),
inject_missingness rate bounds and source-conditional variation,
derive_binary_features, cap_expected_acv, rename_and_select,
boost_leakage_trap, and 6 CLI entrypoint tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 30, 2026 16:45
@shaypal5 shaypal5 added the type: test Test additions or fixes label Apr 30, 2026
@shaypal5 shaypal5 self-assigned this Apr 30, 2026
@github-actions

This comment has been minimized.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds automated coverage for the v5 build/validation scripts by introducing unit tests for the build pipeline functions and subprocess-based CLI integration tests for the validator entrypoint.

Changes:

  • Add unit tests for scripts/build_v5_snapshot.py pipeline steps (feature derivation, ACV capping, renaming/selection, subsampling, missingness injection, leakage trap boost).
  • Add CLI integration tests for scripts/validate_lead_scoring_dataset.py covering key flags and exit codes.
  • Update .agent-plan.md to record completion of the test work.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 4 comments.

File Description
tests/scripts/test_build_v5_snapshot.py Unit tests for v5 snapshot pipeline functions (determinism, immutability, edge cases).
tests/scripts/test_validate_cli.py Subprocess CLI integration tests for validator entrypoint and flags.
tests/scripts/init.py Marks tests/scripts as a package (test organization).
.agent-plan.md Notes the addition of build/validation script tests in the plan.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/scripts/test_validate_cli.py Outdated
Comment on lines +98 to +156
def test_valid_csv_exit_code_zero(self, tmp_path):
csv_path = _make_valid_csv(tmp_path)
result = subprocess.run( # noqa: S603
[sys.executable, str(_SCRIPT_PATH), "--csv", str(csv_path)],
capture_output=True,
text=True,
timeout=120,
)
assert result.returncode == 0, f"stdout: {result.stdout}\nstderr: {result.stderr}"

def test_invalid_csv_exit_code_one(self, tmp_path):
csv_path = _make_invalid_csv(tmp_path)
result = subprocess.run( # noqa: S603
[sys.executable, str(_SCRIPT_PATH), "--csv", str(csv_path)],
capture_output=True,
text=True,
timeout=120,
)
assert result.returncode == 1

def test_out_json_flag(self, tmp_path):
csv_path = _make_valid_csv(tmp_path)
json_path = tmp_path / "report.json"
subprocess.run( # noqa: S603
[
sys.executable,
str(_SCRIPT_PATH),
"--csv",
str(csv_path),
"--out-json",
str(json_path),
],
capture_output=True,
text=True,
timeout=120,
)
# JSON report should be written regardless of pass/fail
assert json_path.exists()
report = json.loads(json_path.read_text())
assert "passed" in report
assert "checks" in report

def test_emit_release_snippet_flag(self, tmp_path):
csv_path = _make_valid_csv(tmp_path)
result = subprocess.run( # noqa: S603
[
sys.executable,
str(_SCRIPT_PATH),
"--csv",
str(csv_path),
"--emit-release-snippet",
],
capture_output=True,
text=True,
timeout=120,
)
# Snippet should be emitted regardless of pass/fail
assert "RELEASE SNIPPET" in result.stdout

Copilot AI Apr 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These CLI tests invoke the full validator in multiple subprocesses. Since validate_dataset() runs a scikit-learn baseline plus multi-seed leakage-trap evaluation (default 10 seeds), each invocation does a non-trivial amount of work. Consider consolidating some flag assertions into fewer subprocess runs (e.g., combine --out-json and --emit-release-snippet with the happy-path run) to keep the suite fast and reduce CI time.

Copilot uses AI. Check for mistakes.
Comment on lines +217 to +218
# All available positives should be included
assert result["converted"].sum() <= 10

Copilot AI Apr 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In test_insufficient_positives, the assertion result["converted"].sum() <= 10 doesn’t actually verify the intended behavior (“all available positives should be included”). A bug that drops positives (e.g., returning 0 positives) would still satisfy <= 10. Consider asserting that the number of positives in the result equals the number of positives available in the input (or at least equals min(n_pos_needed, positives_available)).

Suggested change
# All available positives should be included
assert result["converted"].sum() <= 10
positives_available = int(df["converted"].sum())
n_pos_needed = int(100 * 0.50)
# All available positives should be included
assert result["converted"].sum() == min(n_pos_needed, positives_available)

Copilot uses AI. Check for mistakes.
Comment thread tests/scripts/test_build_v5_snapshot.py Outdated
Comment on lines +224 to +226
subsample(df, rng, n=100, target_rate=0.10) # need 90 negatives
captured = capsys.readouterr()
assert "WARNING" in captured.err

Copilot AI Apr 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_insufficient_negatives only checks that a warning is emitted, but doesn’t assert anything about the returned sample composition/size after the adjustment. To make this regression-proof, consider asserting the output length and that the number of negatives equals the available negatives (or the adjusted target) when negatives are insufficient.

Suggested change
subsample(df, rng, n=100, target_rate=0.10) # need 90 negatives
captured = capsys.readouterr()
assert "WARNING" in captured.err
result = subsample(df, rng, n=100, target_rate=0.10) # need 90 negatives
captured = capsys.readouterr()
assert "WARNING" in captured.err
# With 200 rows at 95% conversion, only 10 negatives are available.
# Requesting 100 rows at 10% conversion would require 90 negatives,
# so the sample must be adjusted to include all 10 negatives and the
# corresponding 10 positives, for a total of 20 rows.
assert len(result) == 20
negative_count = (result["converted"] == 0).sum()
positive_count = (result["converted"] == 1).sum()
assert negative_count == 10
assert positive_count == 10

Copilot uses AI. Check for mistakes.
Comment thread tests/scripts/test_validate_cli.py Outdated
Comment on lines +157 to +171
def test_enforce_1000_flag_fails_on_small(self, tmp_path):
csv_path = _make_valid_csv(tmp_path, n=200)
result = subprocess.run( # noqa: S603
[
sys.executable,
str(_SCRIPT_PATH),
"--csv",
str(csv_path),
"--enforce-1000",
],
capture_output=True,
text=True,
timeout=120,
)
assert result.returncode == 1

Copilot AI Apr 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_enforce_1000_flag_fails_on_small only asserts a non-zero exit code. If the 200-row fixture ever fails another validation check (e.g., baseline AUC), this test would still pass even if --enforce-1000 stopped affecting row-count enforcement. Consider asserting the failure reason via stdout (e.g., that the summary contains a failing row_count check / expected-row message), or running once without --enforce-1000 to prove it passes without enforcement.

Copilot uses AI. Check for mistakes.
…oundary tests

- Extract shared `make_v5_dataset()` builder into `tests/conftest.py`,
  eliminating duplicate synthetic data generators across 3 test files
- Delete 4 tautological missingness tests (trivially true at n=1000)
  and the weak `test_rows_come_from_input` (O(n^2), tests pandas internals)
- Strengthen `test_insufficient_negatives` to assert output composition
- Add `@pytest.mark.parametrize` for subsample target rate/seed combos
  and missingness rate bounds across multiple seeds
- Add boundary tests: `subsample` with n > input, `boost_leakage_trap`
  with zero converted leads, `inject_missingness` with small n and
  unknown lead sources, `rename_and_select` with extra columns
- Use session-scoped fixtures for CLI tests (avoid regenerating CSV 4x)
- Add precondition assertion on fixture conversion rate so sigmoid
  drift produces a clear error message instead of mysterious CLI failure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

pr-agent-context report:

This run includes unresolved review comments on PR #28 in repository https://github.com/leadforge-dev/leadforge

For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.

After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.

# Copilot Comments

## COPILOT-1
Location: tests/scripts/test_validate_cli.py
URL: https://github.com/leadforge-dev/leadforge/pull/28#discussion_r3169531336
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    These CLI tests invoke the full validator in multiple subprocesses. Since `validate_dataset()` runs a scikit-learn baseline plus multi-seed leakage-trap evaluation (default 10 seeds), each invocation does a non-trivial amount of work. Consider consolidating some flag assertions into fewer subprocess runs (e.g., combine `--out-json` and `--emit-release-snippet` with the happy-path run) to keep the suite fast and reduce CI time.

## COPILOT-2
Location: tests/scripts/test_build_v5_snapshot.py:231
URL: https://github.com/leadforge-dev/leadforge/pull/28#discussion_r3169531378
Root author: copilot-pull-request-reviewer

Comment:
    In `test_insufficient_positives`, the assertion `result["converted"].sum() <= 10` doesn’t actually verify the intended behavior (“all available positives should be included”). A bug that drops positives (e.g., returning 0 positives) would still satisfy `<= 10`. Consider asserting that the number of positives in the result equals the number of positives available in the input (or at least equals `min(n_pos_needed, positives_available)`).
    ~~~suggestion
            positives_available = int(df["converted"].sum())
            n_pos_needed = int(100 * 0.50)
            # All available positives should be included
            assert result["converted"].sum() == min(n_pos_needed, positives_available)
    ~~~

## COPILOT-3
Location: tests/scripts/test_build_v5_snapshot.py
URL: https://github.com/leadforge-dev/leadforge/pull/28#discussion_r3169531402
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    `test_insufficient_negatives` only checks that a warning is emitted, but doesn’t assert anything about the returned sample composition/size after the adjustment. To make this regression-proof, consider asserting the output length and that the number of negatives equals the available negatives (or the adjusted target) when negatives are insufficient.
    ~~~suggestion
            result = subsample(df, rng, n=100, target_rate=0.10)  # need 90 negatives
            captured = capsys.readouterr()
            assert "WARNING" in captured.err
            # With 200 rows at 95% conversion, only 10 negatives are available.
            # Requesting 100 rows at 10% conversion would require 90 negatives,
            # so the sample must be adjusted to include all 10 negatives and the
            # corresponding 10 positives, for a total of 20 rows.
            assert len(result) == 20
            negative_count = (result["converted"] == 0).sum()
            positive_count = (result["converted"] == 1).sum()
            assert negative_count == 10
            assert positive_count == 10
    ~~~

## COPILOT-4
Location: tests/scripts/test_validate_cli.py
URL: https://github.com/leadforge-dev/leadforge/pull/28#discussion_r3169531418
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    `test_enforce_1000_flag_fails_on_small` only asserts a non-zero exit code. If the 200-row fixture ever fails another validation check (e.g., baseline AUC), this test would still pass even if `--enforce-1000` stopped affecting row-count enforcement. Consider asserting the failure reason via stdout (e.g., that the summary contains a failing `row_count` check / expected-row message), or running once without `--enforce-1000` to prove it passes without enforcement.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25178444170 attempt 1
Comment timestamp: 2026-04-30T17:00:07.780916+00:00
PR head commit: f087d764ce3e7ad8812736f062ed47c62c95f5a6

@shaypal5 shaypal5 merged commit 3c0b911 into main Apr 30, 2026
6 checks passed
@shaypal5 shaypal5 deleted the add-build-pipeline-tests branch April 30, 2026 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type: test Test additions or fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add tests for build pipeline scripts (build_v4_snapshot.py, validate_v4_dataset.py)

2 participants